In modern low-power embedded platforms, floating-point (FP) operations emerge as a major contributor to the energy consumption of compute-intensive applications with large dynamic range. Experimental evidence shows that 50% of the energy consumed by a core and its data memory is related to FP computations. The adoption of FP formats requiring a lower number of bits is an interesting opportunity to reduce energy consumption, since it makes it possible to simplify the arithmetic circuitry and to reduce the bandwidth between memory and registers by enabling vectorization. From a theoretical point of view, the adoption of multiple FP types fits perfectly with the principle of transprecision computing, allowing fine-grained control of approximation while meeting specified constraints on the precision of final results. In this paper we propose an extended FP type system with complete hardware support to enable transprecision computing on low-power embedded processors, including two standard formats (binary32 and binary16) and two new formats (binary8 and binary16alt). First, we introduce a software library that enables exploration of FP types by tuning both the precision and the dynamic range of program variables. Then, we present a methodology to integrate our library with an external tool for precision tuning, and experimental results that highlight the clear benefits of introducing the new formats. Finally, we present the design of a transprecision FP unit capable of handling 8-bit and 16-bit operations in addition to standard 32-bit operations. Experimental results on FP-intensive benchmarks show that up to 90% of FP operations can be safely scaled down to 8-bit or 16-bit formats. Thanks to precision tuning and vectorization, execution time is decreased by 12% and memory accesses are reduced by 27% on average, leading to a reduction of energy consumption of up to 30%.
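As a minimal illustration of the precision/dynamic-range trade-off that motivates the formats above (this sketch uses Python's standard-library `struct` support for IEEE 754 binary16, not the paper's library or hardware), rounding a value through binary16 exposes both the reduced precision and the narrower dynamic range that precision tuning must keep within the application's error bounds:

```python
import struct

def to_binary16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value and back to a float."""
    # struct's 'e' format code packs/unpacks half-precision (binary16).
    return struct.unpack('<e', struct.pack('<e', x))[0]

x = 0.1
h = to_binary16(x)
# binary16 keeps only ~3 decimal digits of precision, so 0.1 is perturbed.
print(f"0.1 rounded through binary16: {h} (abs error {abs(h - x):.2e})")

# binary16's largest finite value is 65504; larger magnitudes overflow.
try:
    to_binary16(70000.0)
except OverflowError:
    print("70000.0 does not fit in binary16 (max finite value is 65504)")
```

A format like the paper's binary16alt trades mantissa bits for exponent bits, accepting an even larger rounding error in exchange for a dynamic range closer to binary32's.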